Add cudf.options #11193

Merged
merged 18 commits into from
Jul 28, 2022

isVoid
Contributor

@isVoid isVoid commented Jul 1, 2022

This PR adds cudf.options, a global dictionary to store configurations. A set of helper functions for managing the registry is also included. See the documentation included in the PR for details.
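
As a rough illustration of the idea described above (not the PR's actual code), a global options dictionary with helper functions might look like the following sketch. The names `_OPTIONS`, `_register_option`, `get_option`, `set_option`, and the option `example_flag` are assumptions chosen for this example; the documentation included in the PR is the authoritative reference.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Option:
    default: Any
    value: Any
    description: str
    validator: Callable[[Any], None]


# Module-level registry mapping option names to Option records.
_OPTIONS: Dict[str, Option] = {}


def _register_option(name, default, description, validator):
    """Validate the default value and add a new option to the registry."""
    validator(default)
    _OPTIONS[name] = Option(default, default, description, validator)


def get_option(name):
    """Return the current value of a registered option."""
    try:
        return _OPTIONS[name].value
    except KeyError:
        raise KeyError(f'"{name}" does not exist.') from None


def set_option(name, value):
    """Validate and store a new value for a registered option."""
    try:
        option = _OPTIONS[name]
    except KeyError:
        raise KeyError(f'"{name}" does not exist.') from None
    option.validator(value)
    option.value = value


def _require_bool(value):
    if not isinstance(value, bool):
        raise ValueError(f"{value!r} is not a bool")


# Hypothetical option used only to exercise the helpers.
_register_option(
    "example_flag", False, "A hypothetical on/off switch.", _require_bool
)
```

Validating on both registration and assignment keeps a bad value from ever entering the registry, so readers of an option can trust what they get back.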

See demonstration use in: #11272

Closes #5311

@github-actions github-actions bot added the Python Affects Python cuDF API. label Jul 1, 2022
@isVoid isVoid added feature request New feature or request non-breaking Non-breaking change labels Jul 1, 2022
@isVoid isVoid marked this pull request as ready for review July 1, 2022 23:22
@isVoid isVoid requested a review from a team as a code owner July 1, 2022 23:22
@isVoid isVoid requested review from bdice and charlesbluca July 1, 2022 23:22
@vyasr
Contributor

vyasr commented Jul 2, 2022

At a glance, this design looks solid. Before we move too far forward with this, though, given how many times something like this has been requested, it would be good to see how well it works in practice for the use cases that we've considered (offhand, things like requiring stable sorts or forcing exact pandas compatibility for certain APIs come to mind).

@isVoid
Contributor Author

isVoid commented Jul 2, 2022

@vyasr certainly. The direct motivation for this PR is #10558. It's currently blocked by #11182 to fully implement what's needed. The stable sort idea sounds useful, but I recall that pandas configures this by choosing the sorting method in the API, not through config options. Is there a more specific stable sort use case you are referring to that I missed?

@vyasr
Contributor

vyasr commented Jul 5, 2022

The stable sort was just an arbitrary example. I was referring to the fact that APIs like cudf.merge and DataFrame.drop_duplicates order the results nondeterministically, and we have received many requests to change that. In at least some cases we could achieve deterministic ordering by doing some extra work, but we currently do not. Making that configurable is probably the most common reason that I've heard us discuss something like cudf.config in the past. If #10558 is the immediate goal, I'm just asking to see an example (perhaps in another branch based on this one) of how cudf.config would work for such a use case. It doesn't have to be sorting.

Contributor

@bdice bdice left a comment

A few comments. I'm confused about why this deviates so strongly from the design of Pandas' options. I would try to align with that if at all possible, rather than coming up with an entirely different structure and naming scheme.

@vyasr
Contributor

vyasr commented Jul 7, 2022

> A few comments. I'm confused about why this deviates so strongly from the design of Pandas' options. I would try to align with that if at all possible, rather than coming up with an entirely different structure and naming scheme.

Part of me agrees with this, but just to play devil's advocate: do we anticipate having any overlap in the actual options that we offer? If not, it might be rather confusing to have something that looks like the same API but actually has no shared behavior since it's a completely different set of configuration options.

@shwina
Contributor

shwina commented Jul 7, 2022

> do we anticipate having any overlap in the actual options that we offer?

Potentially? For example, we currently piggyback off of Pandas' options context to control display behavior. There's no reason the user couldn't control the behavior of Pandas and cuDF separately.

@isVoid
Contributor Author

isVoid commented Jul 13, 2022

Summarizing offline sync: We want to model the cudf configuration interface closely on the pandas options interface to provide a more coherent experience for people transitioning from pandas. As of today, the options that pandas supports for compute and I/O engines are incompatible with the nature of a GPU dataframe library. Thus we don't model cudf.options as overlapping with the pandas option set, but as an independent set. The first iteration will aim to provide a minimal user interface that is still coherent with pandas.

@codecov

codecov bot commented Jul 14, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@ae1b581).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #11193   +/-   ##
===============================================
  Coverage                ?   86.43%           
===============================================
  Files                   ?      144           
  Lines                   ?    22808           
  Branches                ?        0           
===============================================
  Hits                    ?    19714           
  Misses                  ?     3094           
  Partials                ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae1b581...438003c.

@isVoid isVoid requested a review from bdice July 14, 2022 05:11
@isVoid isVoid changed the title from "Add cudf.config" to "Add cudf.options" Jul 14, 2022
Contributor

@bdice bdice left a comment

Feedback attached. I am glad the decision was made to stick close to what Pandas provides here - I have a few suggestions for improved alignment.

@isVoid isVoid requested a review from charlesbluca July 19, 2022 00:45
Comment on lines +74 to +79
def _build_option_description(name, opt):
    return (
        f"{name}:\n"
        f"\t{opt.description}\n"
        f"\t[Default: {opt.default}] [Current: {opt.value}]"
    )
Contributor

Does it make more sense to implement a custom __str__ on Option?

Contributor

It might be alright to do that, but the Option doesn't know its own name, so it's missing a crucial piece of information. I'm happy with the current state here.

Contributor

@mroeschke mroeschke left a comment

One feature of the pandas options functions that I don't think is well publicized is that you can pass a string that regex-matches an option name, e.g.

pd.get_option('my') would be equivalent to pd.get_option('my_option') if 'my' could only match 'my_option'

Personally, I am not fond of this feature as it's not explicit, and I'm not sure the benefit justifies the increased code complexity, but I'm noting it here in case you want to include that functionality.
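
For readers unfamiliar with this behavior, here is a small self-contained sketch of pattern-based option lookup, loosely modeled on what the comment describes. It is not the pandas implementation: `select_options` and `resolve_option` are hypothetical helpers, and the rule that an ambiguous pattern raises an error is an assumption of this sketch.

```python
import re


def select_options(pattern, names):
    """Return the option names that `pattern` refers to."""
    if pattern in names:
        # An exact name always refers to exactly that option.
        return [pattern]
    return [name for name in names if re.search(pattern, name, re.IGNORECASE)]


def resolve_option(pattern, names):
    """Resolve `pattern` to a single option name, or raise KeyError."""
    matches = select_options(pattern, names)
    if not matches:
        raise KeyError(f"No such option: {pattern!r}")
    if len(matches) > 1:
        raise KeyError(f"Pattern {pattern!r} matched multiple options: {matches}")
    return matches[0]
```

With names `["my_option", "other"]`, the pattern `"my"` resolves to `"my_option"`, which is the implicit behavior the reviewer finds questionable.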

@bdice
Contributor

bdice commented Jul 19, 2022

> One feature of pandas options functions that I don't think it's publicized well is that you can pass a string that can regex match an option name [...]

I agree that's not necessary here, for simplicity's sake. We can align with the pandas design in lots of ways but regex matching feels like overkill unless/until we need it (e.g. if we have dozens of options and option namespaces with dot . separators).

Contributor

@wence- wence- left a comment

Very minor wording, but looks good, thanks!

Co-authored-by: Charles Blackmon-Luca
Co-authored-by: Lawrence Mitchell
@isVoid isVoid requested a review from charlesbluca July 22, 2022 22:50
@shwina
Contributor

shwina commented Jul 25, 2022

rerun tests

Contributor

@bdice bdice left a comment

Some suggested refactors and typos/grammar fixes.

Co-authored-by: Bradley Dice
Contributor

@bdice bdice left a comment

Nice work!

@isVoid
Contributor Author

isVoid commented Jul 26, 2022

Note that I have ported several review comments from @vyasr at #11272

@isVoid
Contributor Author

isVoid commented Jul 28, 2022

rerun tests

@shwina
Contributor

shwina commented Jul 28, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e4ce301 into rapidsai:branch-22.08 Jul 28, 2022
@jakirkham
Member

Thanks all! 🙏

rapids-bot bot pushed a commit that referenced this pull request Aug 1, 2022
This PR introduces a cudf option to allow users to control the default bitwidth for integer and floating types. The first iteration only plans to provide three options: `None`, 32-bit and 64-bit. When set to `None`, the result dtype will align with what pandas constructs. Otherwise, it defaults to what the user specifies.

"Default" implies that it should only affect places that require type inference, which include:

- CSV/JSON readers when dtypes are not specified
- cuDF constructors
- Materializing a range index.
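
The three-valued setting described above can be sketched in plain Python as follows. This is an illustration only: `validate_bitwidth` and `inferred_dtype` are hypothetical helpers, dtypes are represented as strings rather than real dtype objects, and treating `None` as 64-bit is an assumption based on what pandas typically infers for Python integers and floats.

```python
def validate_bitwidth(value):
    """Accept only None, 32, or 64."""
    if value is None:
        return
    if isinstance(value, int) and value in (32, 64):
        return
    raise ValueError(f"{value!r} is not a valid bitwidth; expected None, 32, or 64")


def inferred_dtype(kind, bitwidth):
    """Return the dtype name used when no dtype was given explicitly.

    `kind` is "int" or "float"; `bitwidth` is the option's current value.
    """
    if kind not in ("int", "float"):
        raise ValueError(f"unsupported kind: {kind!r}")
    validate_bitwidth(bitwidth)
    if bitwidth is None:
        # Assumption: follow the pandas-like default of 64-bit inference.
        return f"{kind}64"
    return f"{kind}{bitwidth}"
```

Because the option only feeds type inference, a caller that passes an explicit dtype would bypass `inferred_dtype` entirely in this model.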

This PR is the first demonstration use of `cudf.options`, depending on #11193. The diff will shrink once it's merged.

closes #11182 #10318

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #11272
Labels
feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[FEA] A cudf.config module to manage configuration options
8 participants